In the laboratory, data are generated from enzymatic activity readings measured by fluorescence. These data must be reordered before any statistical analysis is run, and the end result of the analysis is a set of plots.
Unordered samples
Solve the loading problem
Standardize the input data:
Figure 1: Samples A, B, and C with 3 replicates each
The input matrix is ordered by sample and by replicate, so that the replicates of each sample sit in adjacent columns.
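The reordering step can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the `layout` mapping, the well names, and the `order_wells` helper are hypothetical and do not come from the original script, which expects the input to be pre-ordered already.

```python
# Hypothetical plate layout: which sample each well belongs to.
# Well names and sample labels are illustrative only.
layout = {
    "A1": "A", "A2": "B", "A3": "C",
    "B1": "A", "B2": "B", "B3": "C",
    "C1": "A", "C2": "B", "C3": "C",
}


def order_wells(columns, layout):
    """Group columns by sample, keeping the original order within each sample."""
    # Samples in order of first appearance (dict keys preserve insertion order).
    samples = list(dict.fromkeys(layout[c] for c in columns))
    return [c for s in samples for c in columns if layout[c] == s]


columns = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"]
ordered = order_wells(columns, layout)
print(ordered)
# → ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'B3', 'C3']
```

With a pandas DataFrame, the resulting list could be applied as `df[ordered]` before the file is passed to the plotting script.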
Program
import argparse
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def plot_time_vs_average(file_path, rep, sep, output_graph, sample_names):
    """
    Plot Time vs Average per Sample of the measurements

    Plots the results of fluorescence measurements of liquid samples from
    96-well plates (may be any desired matrix). This script reads a CSV file
    containing time series data and plots the average values per sample over
    time.

    Usage:
        python plot_data_96plate.py <file> [--rep <step_size>] [--sep <delimiter>] [--output <output_file>] [--sample-names <names>...]

    Arguments:
        file                         : Path to the input CSV file. It must contain the
                                       readings of each sample, pre-ordered like the
                                       example data.
        --rep <number of replicates> : Number of replicates per sample; the code calls it
                                       "Step size for sub-coordinates" (default: 3).
        --sep <delimiter>            : Delimiter used in the input file (default: ',').
        --output <output_file>       : File name to save the output graph.
        --sample-names <names>       : Custom names for the samples in the plot.

    Example:
        python plot_script.py data.csv --rep 3 --sep ';' --output graph.png --sample-names Blank Mutant_01 Mutant_02

    Example Data:
                   A1    A2    A3    A4    A5    A6    A7    A8    A9
        Time
        0:29:10  2004  1974  1942  1808  1799  1806  2526  1899  1899
        0:59:10  1794  1819  1911  1722  1675  1734  2416  1738  1738
        1:29:10  1845  1902  1871  1738  1822  1655  2354  1758  1758
        ...
    """
    # Read the file and retrieve the column names
    df = pd.read_csv(file_path, index_col=0, sep=sep)
    coordinates = df.columns.tolist()

    # Group consecutive columns into blocks of `rep` replicates, one block per sample
    sub_coordinates_dict = {}
    for i in range(0, len(coordinates) - rep + 1, rep):
        sub_coordinates = coordinates[i:i + rep]
        key = f'Sample_{i // rep + 1}'  # Generate unique key
        sub_coordinates_dict[key] = sub_coordinates

    # Calculate average per sample on each time
    averages_per_time = {}
    for time, row in df.iterrows():
        for key, sub_coordinates in sub_coordinates_dict.items():
            sample_values = row[sub_coordinates]
            average = sample_values.mean()
            if time in averages_per_time:
                averages_per_time[time][key] = average
            else:
                averages_per_time[time] = {key: average}

    # Prepare data for plotting
    data = []
    for time, averages in averages_per_time.items():
        for sample, average in averages.items():
            data.append({'Time': time, 'Sample': sample, 'Average': average})

    # Convert data to DataFrame
    df_plot = pd.DataFrame(data)

    # Customize sample names
    if sample_names:
        sample_names_dict = dict(zip(sub_coordinates_dict.keys(), sample_names))
        df_plot['Sample'] = df_plot['Sample'].replace(sample_names_dict)

    # Plotting
    sns.scatterplot(data=df_plot, x='Time', y='Average', hue='Sample')
    plt.xticks(rotation=90)

    # Export the plot
    if output_graph:
        plt.savefig(output_graph)

    plt.show()


if __name__ == '__main__':
    # Create argument parser
    parser = argparse.ArgumentParser(description='Plot time vs average per sample.')

    # Add arguments
    parser.add_argument('file', type=argparse.FileType('r'), help='Input file path')
    parser.add_argument('--rep', type=int, default=3, help='Step size for sub-coordinates for the sample (default: 3)')
    parser.add_argument('--sep', type=str, default=',', help='Delimiter for input file (default: ",")')
    parser.add_argument('--output', type=str, help='Output graph file name')
    parser.add_argument('--sample-names', type=str, nargs='+', help='Customize sample names')

    # Parse arguments
    args = parser.parse_args()

    # Call the plot function with provided arguments
    plot_time_vs_average(args.file.name, args.rep, args.sep, args.output, args.sample_names)
Arguments:
    file                         : Path to the input CSV file. It must contain the readings of each sample, pre-ordered like the example data.
    --rep <number of replicates> : Number of replicates per sample; the code calls it "Step size for sub-coordinates" (default: 3).
    --sep <delimiter>            : Delimiter used in the input file (default: ',').
    --output <output_file>       : File name to save the output graph.
    --sample-names <names>       : Custom names for the samples in the plot.
The program only computes the average of the replicates at each time point and plots it.
Code walkthrough:
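The central step of the script is splitting the columns into consecutive blocks of `rep` replicates and taking the mean of each block at every time point. A minimal sketch of that step in plain Python, without pandas, might look as follows. The `average_replicates` helper is a hypothetical name introduced here for illustration, and the row is the first time point (0:29:10) of the example data in the script's docstring.

```python
from statistics import mean


def average_replicates(row, rep=3):
    """Average consecutive blocks of `rep` values.

    Like the loop in the script, a trailing block with fewer than
    `rep` values is ignored.
    """
    return [mean(row[i:i + rep]) for i in range(0, len(row) - rep + 1, rep)]


# First time point of the example data (wells A1..A9, three samples x three replicates)
row = [2004, 1974, 1942, 1808, 1799, 1806, 2526, 1899, 1899]
averages = average_replicates(row, rep=3)
print(averages)  # one average per sample
```

The script does the same thing row by row with `df.iterrows()` and stores the result under keys `Sample_1`, `Sample_2`, and so on, which `--sample-names` can later replace.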